Data access

ABSTRACT

A data access system is provided, including proxy servers (100) for caching “local” copies of selected data sets stored on data servers (115) to which users request access. Each proxy server (100) is adapted to generate (135) a subscription request message in respect of each identified cached data set for submission to a conventional “publish & subscribe” data distribution system (110). The data distribution system (110) is arranged with access to “published” updates to data sets, made available from respective data servers (115). Upon receipt of a published updated data set having an identifier matching that in an earlier-received subscription request, the data distribution system (110) forwards the data set to the subscribing proxy server (100) to enable update to the respective cached copy. Thus, a proxy server (100), having decided to cache a particular data set, need only issue a subscription request message in order to receive all subsequent updates, as they become available, until choosing to remove the data set from the cache.

[0001] This invention relates to data access and in particular to maintaining integrity of cached copies of data.

[0002] The use of proxy servers to cache frequently accessed data sets is well known. Proxy servers may be provided to service a “local” community of users, storing (caching) local copies of frequently requested data sets that would otherwise need to be retrieved from their respective originating data sources every time a user requested access to them. Once a proxy server has stored a local copy of a particular data set, a subsequent request for access by a user to that data set is intercepted by the proxy server and access provided rapidly to the locally cached copy rather than to the originating source specified in the request.

[0003] A proxy server may include features to monitor user access requests and to select data sets for caching according to a predetermined selection algorithm. For example, a data set may be selected for caching if access to it was requested from three or more different users over a predetermined time period. A cached data set may be deleted from the cache if the time period between consecutive access requests exceeds a predetermined threshold.
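By way of illustration only, the example selection criteria above might be sketched as follows. The class name `DataSetSelector`, its parameters and their default values are assumptions made for this sketch and do not appear in the present application. Eviction is approximated here by the time elapsed since the most recent request.

```python
from collections import defaultdict


class DataSetSelector:
    """Hypothetical selector: cache a data set once enough distinct users ask for it."""

    def __init__(self, min_users=3, window_seconds=3600.0, idle_seconds=86400.0):
        self.min_users = min_users            # distinct users needed to trigger caching
        self.window_seconds = window_seconds  # the "predetermined time period"
        self.idle_seconds = idle_seconds      # gap after which a cached copy is dropped
        self._requests = defaultdict(list)    # data set id -> [(timestamp, user)]

    def record_request(self, data_set_id, user, now):
        """Log a request; return True if the data set now qualifies for caching."""
        history = self._requests[data_set_id]
        history.append((now, user))
        # Keep only the requests that fall inside the sliding window.
        cutoff = now - self.window_seconds
        history[:] = [(t, u) for t, u in history if t >= cutoff]
        return len({u for _, u in history}) >= self.min_users

    def should_evict(self, data_set_id, now):
        """Return True if the time since the last request exceeds the idle threshold."""
        history = self._requests.get(data_set_id)
        if not history:
            return True
        return now - history[-1][0] > self.idle_seconds
```

Repeated requests from the same user do not count towards the threshold in this sketch, since the criterion refers to different users.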

[0004] A proxy server must ensure that any cached data sets remain up-to-date with respect to changes to the “original” data set held at the originating data source. To achieve this, known proxy servers use one or more of the following techniques:

[0005] (1) Periodic checking: once a data set has been cached, the proxy server submits periodic requests for access to the original source of the data set to determine whether amendments have been made. However, if the proxy server is to keep up to date with many cached data sets, a great deal of proxy server processing time and communications bandwidth is consumed if the period between requests is to be kept sufficiently short in order to avoid serving out-of-date data sets to users.

[0006] (2) Patterns associated with data being updated: the proxy server looks for patterns in the update of a data set and attempts to predict when it will next be amended. For example, if a data set has consistently been updated each morning at 6 am (e.g. a newspaper), then the proxy server may download a new copy of the data set from the corresponding source at, say, 6.01 am every morning. However, it is not possible to be 100% accurate in predicting when a data set will be updated.

[0007] (3) Specified expiry time: a data set provider tags the data set with a ‘will be valid until ...’ message. The proxy server will not seek to refresh the cached copy until after that time. However, timely refresh of the cached copy depends upon the clocks of the proxy and data source being reasonably well aligned and upon the data set not expiring early. In practice short expiry periods are used, e.g. 1 hour.

[0008] (4) Update queries triggered by user access requests: every time a proxy server receives a request for access to a cached data set, the proxy server sends a message to the corresponding source of that data set asking “Has this data set been updated since xxxx”, where xxxx is a time or date. If it has, then a copy of the new data set is downloaded to the cache. While this is one of the most common modes of operation of proxy servers, it may add considerable delay to the servicing of each access request and it consumes communications bandwidth, since an update query is submitted every time. This dramatically decreases the Quality of Service available to broadband users, who expect far more rapid access to requested data sets.
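The cost of technique (4) can be seen in a minimal sketch, given purely for illustration. The classes `OriginServer` and `ValidatingProxy` and the method `get_if_modified_since` are hypothetical stand-ins and do not represent any particular proxy server product. The point shown is that every user access, including a cache hit, costs one query to the originating source.

```python
class OriginServer:
    """Hypothetical stand-in for an originating data source."""

    def __init__(self):
        self.content = {}  # data set id -> (modified_at, body)
        self.queries = 0   # number of update queries received

    def put(self, data_set_id, body, modified_at):
        self.content[data_set_id] = (modified_at, body)

    def get_if_modified_since(self, data_set_id, since):
        """Return (modified_at, body) if changed after `since`, else None."""
        self.queries += 1
        modified_at, body = self.content[data_set_id]
        return (modified_at, body) if modified_at > since else None


class ValidatingProxy:
    """Hypothetical proxy that validates its cached copy on every access."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}  # data set id -> (modified_at, body)

    def get(self, data_set_id):
        cached_at, body = self.cache.get(data_set_id, (float("-inf"), None))
        # One round trip to the source per user request, hit or miss.
        fresh = self.origin.get_if_modified_since(data_set_id, cached_at)
        if fresh is not None:
            self.cache[data_set_id] = fresh
            body = fresh[1]
        return body
```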

[0009] Aspects of the present invention are as set out in the claims, individually or in combinations thereof.

[0010] Embodiments of the present invention can be used to provide, to a group of user interfaces, access to files which are cached locally but which are each automatically updated as soon as an update becomes available at a source file server. This is facilitated by using a publish and subscribe system to update the cached files. The publish and subscribe system in turn can be fed updates to the individual files by a source file server which is adapted to detect file updates and then use “push technology” to send the updates to the publish and subscribe system.

[0011] In at least a first aspect of the present invention, it is not necessary for a proxy (or caching) server to actively validate then update cached data sets once a decision has been made to cache a particular data set. Instead, the proxy server generates a subscription request message for sending to a predetermined node of a known “publish & subscribe” data distribution system, requesting receipt of any new “published” version of the particular identified data set. The data distribution system is arranged with access to published updates to data sets from corresponding data sources. If the data distribution system receives a data set having the same identifier as one subscribed to, the data set will be propagated through the data distribution system and delivered to the subscribing proxy server. On receipt, the proxy server overwrites the existing cached copy of the data set with the newly delivered version.

[0012] If a proxy server deletes a data set from its cache, the proxy server is arranged to generate an “unsubscribe” message for sending to the data distribution system to ensure that further updates to the data set are no longer delivered.
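The life cycle just described (subscribe on caching, overwrite on delivery, unsubscribe on deletion) might be sketched as follows. This is a simplified illustration under stated assumptions: the class `CachingProxy`, the dictionary-shaped messages and the `send_to_distribution_system` callable are all hypothetical, and a real data distribution system would define its own message format and transport.

```python
class CachingProxy:
    """Hypothetical proxy that keeps cached data sets fresh via subscriptions."""

    def __init__(self, address, send_to_distribution_system):
        self.address = address
        self._send = send_to_distribution_system  # callable taking one message dict
        self.cache = {}                           # data set id -> body

    def cache_data_set(self, data_set_id, body):
        """Store a copy and subscribe to all later published versions."""
        self.cache[data_set_id] = body
        self._send({"type": "subscribe", "subscriber": self.address, "id": data_set_id})

    def on_published_update(self, data_set_id, body):
        """Overwrite the cached copy; ignore deliveries for data sets no longer held."""
        if data_set_id in self.cache:
            self.cache[data_set_id] = body

    def evict(self, data_set_id):
        """Delete the cached copy and stop further deliveries."""
        if data_set_id in self.cache:
            del self.cache[data_set_id]
            self._send({"type": "unsubscribe", "subscriber": self.address, "id": data_set_id})
```

Note that the proxy in this sketch never queries the data source after the initial caching decision; freshness depends entirely on delivered updates.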

[0013] Preferably, data sources may be arranged to “publish” to the data distribution system a copy of each data set that is changed at that source. If the data distribution system has received a subscription request from a proxy server in respect of a changed data set received from a source, then the data distribution system delivers the changed data set to the subscribing proxy server. Preferably, a change monitoring process may be implemented at data sources to monitor changes to files and to “push” a copy of each changed file to a predetermined point of entry to the data distribution system. Any changed file that is not the subject of a subscription by a proxy server will not be propagated further by the data distribution system.

[0014] A “data set” or a “file”, in the context of the present invention, might in practice be any of several different types of electronically transmittable entities such as those accessible over the Internet, including for instance text, graphics, spreadsheets, computer programmes, audio, video, multimedia, or data. It should also be noted that a “user” may in practice be non-human, such as a machine or a piece of software.

[0015] There now follows, by way of example only, a description of a specific embodiment of the present invention. This description is to be read in conjunction with the accompanying drawing, in which:

[0016] FIG. 1 shows a block diagram of a data access system according to a preferred embodiment of the present invention; and

[0017] FIG. 2 shows a block diagram of a caching server for use in the data access system of FIG. 1.

[0018] Referring to FIG. 1, a diagram is provided showing in schematic form a data access system according to a preferred embodiment of the present invention. Two proxy servers 100 are shown arranged with access to forwarding computers 105 of a “publish & subscribe” data distribution system 110 comprising, in this example, a simple hierarchy of forwarding computers 105 (FC-1 to FC-6). Servers 115 representing sources of data sets are also shown arranged with access to forwarding computers 105 of the data distribution system 110.

[0019] Referring to FIG. 2, a proxy (or caching) server 100 includes: a user access request monitor 120 for intercepting and monitoring messages sent by client devices 200 containing requests for access to data sets stored on data servers 115; a cache (store) 125; and a data set selector 130 for selecting data sets for storage in the cache 125 according to a predetermined selection algorithm. A proxy server 100 also includes a subscription message generator 135 for generating and sending messages to the data distribution system 110 to subscribe on behalf of the proxy server to receive updates to specified data sets.

[0020] The subscription message generator 135 is arranged to generate subscription messages in a format acceptable to the data distribution system 110. A subscription message includes an address for the subscribing proxy server 100, an indication of whether the message is a “subscribe” or “unsubscribe” request message and a unique identifier for the data set for which updates are/were sought. Preferably, the data set selector 130 may be adapted to signal to the subscription message generator 135 the unique data set identifier of a data set for which a decision has been made to store a copy in the cache 125 and an indication that a “subscription” request message is to be generated. Similarly, upon deciding to remove a data set from the cache 125, the data set selector 130 may signal to the subscription message generator 135 to generate an “unsubscribe” message in respect of that data set.
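The three fields of a subscription message named above might be represented as in the following sketch. The field names, the `RequestKind` enumeration and the JSON encoding are assumptions chosen for illustration; the actual format would be whatever the data distribution system 110 accepts.

```python
import json
from dataclasses import dataclass
from enum import Enum


class RequestKind(Enum):
    SUBSCRIBE = "subscribe"
    UNSUBSCRIBE = "unsubscribe"


@dataclass(frozen=True)
class SubscriptionMessage:
    subscriber_address: str  # address of the subscribing proxy server
    kind: RequestKind        # "subscribe" or "unsubscribe" indication
    data_set_id: str         # unique identifier of the data set

    def to_wire(self) -> str:
        """Serialize to JSON; this wire format is purely illustrative."""
        return json.dumps(
            {
                "subscriber_address": self.subscriber_address,
                "kind": self.kind.value,
                "data_set_id": self.data_set_id,
            },
            sort_keys=True,
        )

    @classmethod
    def from_wire(cls, raw: str) -> "SubscriptionMessage":
        """Rebuild a message from its JSON form."""
        fields = json.loads(raw)
        return cls(
            fields["subscriber_address"],
            RequestKind(fields["kind"]),
            fields["data_set_id"],
        )
```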

[0021] In operation, proxy server 100, by means of the user access request monitor 120, intercepts a user's message requesting access to a data set stored on a specified data server 115. If the requested data set is stored in the cache 125, then the proxy server 100 services the access request by providing access to the cached copy of the requested data set, trapping the user's access request message. If the requested data set is not stored in the cache 125, then the user's access request message is forwarded to the specified data server 115 over a conventional communications network (not shown in FIG. 1). The user's access request is also copied to the data set selector 130 to determine whether or not a copy of the requested data set should be stored in the cache 125. Such a storage decision may be made for example for the benefit of providing faster subsequent user access to that data set. A conventional selection algorithm may be implemented by the data set selector 130, for example implementing selection criteria listed above in the introductory part of the present patent application.

[0022] If the data set selector 130 chooses to store the requested data set in the cache 125, proxy server 100 may be arranged to intercept a response from the specified data server 115 to the corresponding user's access request message and to copy the data set supplied in the response to the cache 125, thereafter forwarding the response to the client device for the requesting user. On selecting the data set for storage in the cache 125, the data set selector 130 is arranged to trigger the subscription message generator 135 to generate a subscription request message, supplying to the subscription message generator 135 a unique identifier for the selected data set for inclusion in the subscription request message, and to send the generated message to the “publish & subscribe” data distribution system 110, preferably to a predetermined forwarding computer 105 of that system.
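The request handling described in the two preceding paragraphs might be sketched as follows. The class `InterceptingProxy` and its three callables are hypothetical names introduced for this sketch: `forward_to_data_server` stands for the conventional network path to a data server 115, `wants_caching` for the data set selector 130, and `subscribe` for the subscription message generator 135.

```python
class InterceptingProxy:
    """Hypothetical request path: serve from cache, else forward and maybe cache."""

    def __init__(self, forward_to_data_server, wants_caching, subscribe):
        self.cache = {}                         # data set id -> body
        self._forward = forward_to_data_server  # callable: data_set_id -> body
        self._wants_caching = wants_caching     # callable: (data_set_id, user) -> bool
        self._subscribe = subscribe             # callable: data_set_id -> None

    def handle_request(self, data_set_id, user):
        # Every intercepted request is copied to the selector.
        selected = self._wants_caching(data_set_id, user)
        if data_set_id in self.cache:
            # Cache hit: the request is trapped and never reaches the data server.
            return self.cache[data_set_id]
        # Cache miss: forward the request and intercept the response.
        body = self._forward(data_set_id)
        if selected:
            # Copy the response into the cache and subscribe to future updates.
            self.cache[data_set_id] = body
            self._subscribe(data_set_id)
        return body
```

In this sketch the subscription request is issued once, at the moment the data set enters the cache, so later requests for the same data set generate no traffic towards either the data server or the data distribution system.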

[0023] On receipt of a subscription request message, the data distribution system 110 registers the request in a conventional way to ensure that a subsequently received data set having the identifier specified in the subscription request will be forwarded and delivered by forwarding computers 105 of the system to one or more subscribing proxy servers 100.
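For illustration, the registration and forwarding behaviour of the data distribution system might be reduced to the following single-node sketch. The class `DistributionSystem` and its methods are hypothetical; the sketch collapses the hierarchy of forwarding computers 105 into one object and is not intended to describe any particular publish and subscribe product.

```python
from collections import defaultdict


class DistributionSystem:
    """Hypothetical single-node stand-in for a publish & subscribe system."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # data set id -> subscriber addresses
        self._deliver = {}                    # address -> delivery callable

    def register_endpoint(self, address, deliver):
        """Record how to reach a proxy server: deliver(data_set_id, body)."""
        self._deliver[address] = deliver

    def subscribe(self, address, data_set_id):
        self._subscribers[data_set_id].add(address)

    def unsubscribe(self, address, data_set_id):
        self._subscribers[data_set_id].discard(address)

    def publish(self, data_set_id, body):
        """Deliver to each subscriber; return the number of copies sent."""
        targets = self._subscribers.get(data_set_id, ())
        for address in sorted(targets):
            self._deliver[address](data_set_id, body)
        # A data set with no subscribers is simply not propagated.
        return len(targets)
```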

[0024] In order for updates to data sets stored on data servers 115 to be made available to the data distribution system 110 for delivery to subscribing proxy servers 100, a simple update monitoring module 140 may be installed on each data server 115 to detect changes to data sets stored in a store 145 and to “publish” each changed data set by forwarding a copy of the data set to a predetermined forwarding computer 105 in the data distribution system 110. An update monitoring module 140 may be implemented in the form of a computer program for installation on a conventional data server. An example of a listing for such a computer program is attached to the present patent application as Annex A. A computer program such as that in Annex A would be suitable for installation and integration into a conventional data server for monitoring directories of files in the data server store 145 and for sending a copy of each file in which a change is detected to a predetermined destination.
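A change monitoring process of the general kind described might be sketched as follows. This sketch is not the listing of Annex A and makes its own assumptions: the class `UpdateMonitor` is hypothetical, changes are detected by comparing SHA-256 digests of file contents between scans, and the `publish` callable stands for forwarding a copy to the predetermined forwarding computer 105.

```python
import hashlib
from pathlib import Path


class UpdateMonitor:
    """Hypothetical monitor that publishes files whose contents have changed."""

    def __init__(self, directory, publish):
        self._directory = Path(directory)
        self._publish = publish  # callable: (relative_name, data_bytes) -> None
        self._digests = {}       # relative name -> last seen content digest

    def scan(self):
        """Publish every file that is new or changed since the previous scan."""
        changed = []
        for path in sorted(self._directory.rglob("*")):
            if not path.is_file():
                continue
            data = path.read_bytes()
            digest = hashlib.sha256(data).hexdigest()
            name = path.relative_to(self._directory).as_posix()
            if self._digests.get(name) != digest:
                self._digests[name] = digest
                self._publish(name, data)
                changed.append(name)
        return changed
```

In this sketch the first call to `scan` treats every existing file as changed, so an initial copy of each file is published; a deployment that only wants later changes could discard the results of the first scan.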

[0025] On receipt of an updated copy of a cached data set, proxy server 100 is arranged to store the updated copy in the cache 125, overwriting the previously stored copy.

[0026] Of course, updates to data sets stored on data servers 115 may be made available by third party sources and “published” to the data distribution system 110 for delivery to subscribing proxy servers 100.

[0027] The preferred embodiment of the present invention as described above may be implemented to operate in the context of the Internet and World-wide Web. A conventional web proxy server, such as the ‘Squid’ server (or ‘Cuttlefish’ derivative), may be adapted to receive and install a subscription message generation module 135 and to enable updates to cached data sets to be written to the cache 125 upon receipt. A skilled person would be readily able to implement proxy server features of the present invention on a conventional proxy server.

[0028] The present invention may also be applied to Wireless Application Protocol (WAP) proxy servers. WAP proxy servers operate in a similar way to web servers employing the Hypertext Transfer Protocol (HTTP) to access data sets. 

1. A file server for retrieving and transmitting files in response to received requests, in a communications network, the file server comprising: i) a request input for receiving file requests from client equipment; ii) request monitoring means for monitoring received file requests; iii) a subscription request output; iv) file retrieval means for retrieving a file identified in a received file request from a first location and transmitting it to the client equipment making the request; and v) a cache for storing local copies of files for which file requests have been received; wherein the request monitoring means is arranged to monitor received file requests, and, in the event of a predetermined condition being met for a file identified by at least one request, to trigger the subscription request output to output a subscription request for the identified file to a second location.
 2. A file server according to claim 1 wherein the predetermined condition comprises a threshold number of requests for the identified file.
 3. A file server according to claim 1 wherein the predetermined condition comprises a threshold rate of requests for the identified file.
 4. A data access system for providing updated files in response to received file requests, the system comprising: i) a subscription service file server which provides a subscription service to at least one caching file server, the subscription service file server being provided with a log for logging subscription data against respective caching file servers in respect of selected files or categories of files; and ii) access to a source file server which provides updates to files to the subscription service file server; said subscription service file server being triggerable, on receipt of an update to a file, to send a copy of the update to each caching file server logged in respect of that file.
 5. A system according to claim 4 wherein said log is arranged to log caching file servers against individual file identifiers.
 6. A system according to claim 4 wherein said log is arranged to log caching file servers against file categories.
 7. A data access system for use in providing updated files in response to received file requests, the system comprising a source file server which provides a file updating service to at least one destination file server, the source file server being provided with: i) update detection means for monitoring one or more files stored by the source file server to detect an update thereto; ii) a destination server log for logging destination information with respect to one or more destination file servers; iii) file selection means for selecting files to be monitored by the update detection means; and means triggerable by a detected update to a file to output a copy of the update to any destination file server having destination information logged in respect of it.
 8. A data access system according to claims 1, 4 and 7.